A hierarchically blocked Jacobi SVD algorithm for single and multiple graphics processing units
Abstract
We present a hierarchically blocked one-sided Jacobi algorithm for the singular value decomposition (SVD), targeting both single and multiple graphics processing units (GPUs). The blocking structure reflects the levels of the GPU memory hierarchy. The algorithm may outperform MAGMA’s dgesvd, while retaining high relative accuracy. To this end, we developed a family of parallel pivot strategies for the GPU’s shared address space that are also applicable to inter-GPU communication. Unlike common hybrid approaches, our algorithm in a single-GPU setting needs a CPU for control purposes only, while utilizing the GPU’s resources to the fullest extent permitted by the hardware. When required by the problem size, the algorithm, in principle, scales to an arbitrary number of GPU nodes. The scalability is demonstrated by a more than twofold speedup for sufficiently large matrices on a Tesla S2050 system with four GPUs versus a single Fermi card.
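For readers unfamiliar with the one-sided Jacobi method named in the abstract, the following is a minimal sequential sketch of the textbook (Hestenes) variant with a row-cyclic pivot order. It is not the hierarchically blocked GPU algorithm of the paper, and it does not use the paper's parallel pivot strategies. The function name `one_sided_jacobi_svd` and the parameters `tol` and `max_sweeps` are illustrative assumptions.

```python
import math

def one_sided_jacobi_svd(a, tol=1e-12, max_sweeps=30):
    """Textbook one-sided Jacobi SVD sketch for a list-of-rows matrix (m >= n).

    Returns (u, sigma, v) such that a[i][j] is approximately
    sum_k u[i][k] * sigma[k] * v[j][k]. Singular values are NOT sorted.
    """
    m, n = len(a), len(a[0])
    g = [row[:] for row in a]  # working copy; its columns converge to U * Sigma
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    for _ in range(max_sweeps):
        rotated = False
        # Row-cyclic pivot order: visit every column pair (p, q) once per sweep.
        for p in range(n - 1):
            for q in range(p + 1, n):
                app = sum(g[i][p] * g[i][p] for i in range(m))
                aqq = sum(g[i][q] * g[i][q] for i in range(m))
                apq = sum(g[i][p] * g[i][q] for i in range(m))
                # Skip pairs that are already orthogonal to relative tolerance.
                if abs(apq) <= tol * math.sqrt(app * aqq):
                    continue
                rotated = True
                # Rotation angle that makes columns p and q orthogonal.
                zeta = (aqq - app) / (2.0 * apq)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                # Apply the same plane rotation to the columns of g and of v.
                for mat, rows in ((g, m), (v, n)):
                    for i in range(rows):
                        x, y = mat[i][p], mat[i][q]
                        mat[i][p] = c * x - s * y
                        mat[i][q] = s * x + c * y
        if not rotated:
            break  # a full sweep without rotations: columns are orthogonal

    # Column norms are the singular values; normalized columns form U.
    sigma = [math.sqrt(sum(g[i][j] ** 2 for i in range(m))) for j in range(n)]
    u = [[g[i][j] / sigma[j] if sigma[j] > 0.0 else 0.0 for j in range(n)]
         for i in range(m)]
    return u, sigma, v
```

Each rotation touches only two columns, so disjoint column pairs can be processed at the same time. Parallel pivot strategies exploit this independence, and the sketch above simply visits the pairs one after another.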
Related resources
Spectral Separation of Quantum Dots within Tissue Equivalent Phantom Using Linear Unmixing Methods in Multispectral Fluorescence Reflectance Imaging
Introduction: Non-invasive Fluorescent Reflectance Imaging (FRI) is used for accessing physiological and molecular processes in biological media. The aim of this article is to separate the overlapping emission spectra of quantum dots within a tissue-equivalent phantom using SVD, Jacobi SVD, and NMF methods in the FRI mode. Materials and Methods: In this article, a tissue-like phantom and an optical...
A GPU-based hyperbolic SVD algorithm
A one-sided Jacobi hyperbolic singular value decomposition (HSVD) algorithm, using a massively parallel graphics processing unit (GPU), is developed. The algorithm also serves as the final stage of solving a symmetric indefinite eigenvalue problem. Numerical testing demonstrates the gains in speed and accuracy over sequential and MPI-parallelized variants of similar Jacobi-type HSVD algorithms....
Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression
We present high performance implementations of the QR and the singular value decomposition of a batch of small matrices hosted on the GPU with applications in the compression of hierarchical matrices. The one-sided Jacobi algorithm is used for its simplicity and inherent parallelism as a building block for the SVD of low rank blocks using randomized methods. We implement multiple kernels based ...
GPGPU Performance Tuning – An illustrated example
This tutorial describes some common techniques to improve performance of GPU-based implementations in linear algebra applications. The example presented here is a Jacobi iteration (commonly used as a smoother in multigrid scenarios) on a sparse matrix arising from Finite Element discretizations of standard operators. However, none of that advanced background is necessary to understand the GPU-...
Investigating the Effects of Hardware Parameters on Power Consumptions in SPMV Algorithms on Graphics Processing Units (GPUs)
Although sparse matrix-vector multiplication (SpMV) algorithms are simple, they are important components of linear algebra algorithms in mathematics and physics. Because these algorithms can be run in parallel, graphics processing units (GPUs) have been considered among the best candidates to run them. In recent years, power consumption has been considered as one of the metr...
Journal title:
- SIAM J. Scientific Computing
Volume 37, Issue
Pages -
Publication date 2015